Creators/Authors contains: "Vasmatzis, George"

  1. Bias in neural network training datasets has been observed to decrease prediction accuracy for groups underrepresented in the training data. Investigating the composition of training datasets used in machine learning models with healthcare applications is therefore vital to ensure equity. Two such models are NetMHCpan-4.1 and NetMHCIIpan-4.0, which predict antigen binding scores to major histocompatibility complex class I and class II molecules, respectively. Because antigen presentation is a critical step in mounting the adaptive immune response, previous work has used these or similar prediction models in a broad array of applications, from explaining asymptomatic viral infection to cancer neoantigen prediction. However, these models have also been shown to be biased toward hydrophobic peptides, suggesting the networks could also contain other sources of bias. Here, we report that the networks’ training datasets are heavily biased toward European Caucasian individuals and against Asian and Pacific Islander individuals. We test the ability of NetMHCpan-4.1 and NetMHCIIpan-4.0 to distinguish true binders from randomly generated peptides on alleles not included in the training datasets. Unexpectedly, we fail to find evidence that the disparities in training data lead to a meaningful difference in prediction quality for alleles not present in the training data. We attempt to explain this result by mapping the HLA sequence space to determine the sequence diversity of the training dataset. Furthermore, we link the residues that have the greatest impact on NetMHCpan predictions to structural features for three alleles (HLA-A*34:01, HLA-C*04:03, HLA-DRB1*12:02).

     
    Free, publicly-accessible full text available January 16, 2025
  2. Abstract

    Fluorescence in situ hybridization (FISH) is the primary technology used to image and count mRNA in single cells, but applications of the technique are limited by photophysical shortcomings of organic dyes. Inorganic quantum dots (QDs) can overcome these problems, but years of development have not yielded viable QD-FISH probes. Here we report that macromolecular size thresholds limit mRNA labeling in cells, and that a new generation of compact QDs produces accurate mRNA counts. Compared with dyes, compact QD probes provide exceptional photostability and more robust transcript quantification due to enhanced brightness. New spectrally engineered QDs also allow quantification of multiple distinct mRNA transcripts at the single-molecule level in individual cells. We expect that QD-FISH will particularly benefit high-resolution gene expression studies in three-dimensional biological specimens, for which quantification and multiplexing are major challenges.

     